Eventually-stationary policies for Markov decision models with non-constant discounting
Authors
Abstract
We investigate the existence of simple policies in finite discounted-cost Markov Decision Processes when the discount factor is not constant. We introduce a class of discount functions called "exponentially representable." Within this class we prove the existence of optimal policies that are eventually stationary (stationary from some time N onward) and provide an algorithm for their computation. Outside this class, optimal policies with this structure do not exist in general.
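The abstract's notion of an eventually-stationary policy can be illustrated with a minimal sketch. The example below is an assumption for illustration, not the paper's algorithm: it takes a hypothetical two-state, two-action MDP whose one-step discount factors vary for the first N periods and then settle to a constant γ (the simplest eventually-geometric case). The stationary tail policy is found by value iteration on the γ-discounted tail problem, and the N non-stationary head decisions are recovered by backward induction.

```python
import numpy as np

# Hypothetical 2-state, 2-action MDP (illustrative numbers only):
# P[a] is the transition matrix under action a, c[a] the per-state cost vector.
P = {0: np.array([[0.9, 0.1], [0.2, 0.8]]),
     1: np.array([[0.5, 0.5], [0.6, 0.4]])}
c = {0: np.array([1.0, 2.0]), 1: np.array([0.5, 3.0])}
states, actions = 2, 2

# Time-varying one-step discount factors for the first N periods,
# followed by a constant factor gamma (a geometric tail).
N, gamma = 3, 0.9
betas = [0.99, 0.95, 0.92]  # discount applied between steps t and t+1, for t < N

# Tail: solve the stationary gamma-discounted problem by value iteration.
V = np.zeros(states)
for _ in range(1000):
    Q = np.array([c[a] + gamma * P[a] @ V for a in range(actions)])
    V_new = Q.min(axis=0)
    if np.max(np.abs(V_new - V)) < 1e-10:
        break
    V = V_new
tail_policy = Q.argmin(axis=0)  # the single stationary policy used from time N onward

# Head: backward induction over the N non-stationary periods,
# starting from the tail value function.
head_policies = []
for t in reversed(range(N)):
    Q = np.array([c[a] + betas[t] * P[a] @ V for a in range(actions)])
    head_policies.append(Q.argmin(axis=0))
    V = Q.min(axis=0)
head_policies.reverse()  # head_policies[t] is the decision rule at time t < N
```

The resulting policy uses a different decision rule at each of the first N steps and then repeats `tail_policy` forever, which is exactly the "stationary from some time N onward" structure the abstract refers to.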
Similar works
Markov decision processes with exponentially representable discounting
We generalize the geometric discount of finite discounted cost Markov Decision Processes to “exponentially representable” discount functions, prove existence of optimal policies which are stationary from some time N onward, and provide an algorithm for their computation. Outside this class, optimal “N-stationary” policies in general do not exist.
Persistently Optimal Policies in Stochastic Dynamic Programming with Generalized Discounting
In this paper we study a Markov decision process with a non-linear discount function. Our approach is in the spirit of the von Neumann-Morgenstern concept and is based on the notion of expectation. First, we define a utility on the space of trajectories of the process in the finite and infinite time horizon and then take its expected value. It turns out that the associated optimization problem l...
Markov Decision Processes with General Discount Functions
In Markov Decision Processes, the discount function determines how much the reward at each point in time adds to the value of the process, and thus deeply affects the optimal policy. Two cases of discount functions are well known and analyzed. The first is no discounting at all, which corresponds to the total- and average-reward criteria. The second case is a constant discount rate, which leads to a...
On the Use of Non-Stationary Policies for Stationary Infinite-Horizon Markov Decision Processes
We consider infinite-horizon stationary γ-discounted Markov Decision Processes, for which it is known that there exists a stationary optimal policy. Using Value and Policy Iteration with some error ε at each iteration, it is well known that one can compute stationary policies that are 2γ/(1−γ)² ε-optimal. After arguing that this guarantee is tight, we develop variations of Value and Policy Iter...
Numerical Analysis of Non-Constant Discounting with an Application to Renewable Resource Management
The possibility of non-constant discounting is important in environmental and resource management problems where current decisions affect welfare in the far-distant future, as with climate change. The difficulty of analyzing models with non-constant discounting limits their application. We describe and provide software to implement an algorithm to numerically obtain a Markov Perfect Equilibrium...